U.S. Performance Across International 
Assessments of Student Achievement 

Special Supplement to The Condition of Education 2009 




NCES 2009-083 

U.S. DEPARTMENT OF EDUCATION 




NATIONAL CENTER for 
EDUCATION STATISTICS 



Institute of Education Sciences 



AUGUST 2009 



Stephen Provasnik 
Patrick Gonzales 

National Center for Education Statistics 

David Miller 

American Institutes for Research 






U.S. Department of Education 

Arne Duncan 
Secretary 

Institute of Education Sciences 

John Easton 
Director 

National Center for Education Statistics 

Stuart Kerachsky 
Acting Commissioner 

The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, and 
reporting data related to education in the United States and other nations. It fulfills a congressional mandate to collect, 
collate, analyze, and report full and complete statistics on the condition of education in the United States; conduct and 
publish reports and specialized analyses of the meaning and significance of such statistics; assist state and local education 
agencies in improving their statistical systems; and review and report on education activities in foreign countries. 

NCES activities are designed to address high-priority education data needs; provide consistent, reliable, complete, 
and accurate indicators of education status and trends; and report timely, useful, and high-quality data to the U.S. 
Department of Education, the Congress, the states, other education policymakers, practitioners, data users, and the 
general public. Unless specifically noted, all information contained herein is in the public domain. 

We strive to make our products available in a variety of formats and in language that is appropriate to a variety of 
audiences. You, as our customer, are the best judge of our success in communicating information effectively. If you have 
any comments or suggestions about this or any other NCES product or report, we would like to hear from you. Please 
direct your comments to: 

National Center for Education Statistics 
Institute of Education Sciences 
U.S. Department of Education 
1990 K Street NW 
Washington, DC 20006-5651 

August 2009 

The NCES World Wide Web Home Page address is http://nces.ed.gov.

The NCES World Wide Web Electronic Catalog address is http://nces.ed.gov/pubsearch.

Suggested Citation 

Provasnik, S., Gonzales, P., and Miller, D. (2009). U.S. Performance Across International Assessments of Student 
Achievement: Special Supplement to The Condition of Education 2009 (NCES 2009-083). National Center for Education 
Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC. 

For ordering information on this report, write to 

U.S. Department of Education 
ED Pubs 
P.O. Box 1398 
Jessup, MD 20794-1398 

or call toll free 1-877-4ED-PUBS or order online at http://www.edpubs.org.

Content Contact 

Stephen Provasnik 
(202) 502-7480 
Stephen.Provasnik@ed.gov



Executive Summary 



The Condition of Education summarizes important 
developments and trends in education using the latest 
available data. The report, which the National Center 
for Education Statistics (NCES) is required by law to 
produce, is an indicator report intended for a general 
audience of readers who are interested in education. The 
indicators represent a consensus of professional judgment 
on the most significant national measures of the condition 
and progress of education for which accurate data are 
available. For the 2009 edition, NCES prepared a special 
analysis to take a closer look at U.S. student performance 
on international assessments. 

This special analysis looks at information gathered from 
recent international studies that U.S. students have 
participated in: the Progress in International Reading 
Literacy Study (PIRLS), the Program for International 
Student Assessment (PISA), and the Trends in 
International Mathematics and Science Study (TIMSS). 
PIRLS, sponsored by the International Association for 
the Evaluation of Educational Achievement (IEA) and 
first conducted in 2001, assesses the reading performance 
of 4th-graders every 5 years. PISA, sponsored by 
the Organization for Economic Cooperation and 
Development (OECD) and first conducted in 2000, 
assesses the reading, mathematics, and science literacy 
of 15-year-old students every 3 years. And TIMSS, 
sponsored by the IEA and first conducted in 1995, 
assesses the mathematics and science performance of both 
4th- and 8th-graders every 4 years. Not all countries 1 have 
participated in all three studies or in all administrations 
of a single study’s assessments. All three studies include 
both developed and developing countries; however, 
TIMSS and PIRLS have a larger proportion of developing 
countries participating than PISA does because PISA 
is principally a study of the member countries of the 
OECD — an intergovernmental organization of 30 
developed countries. 

This special analysis examines the performance of U.S. 
students in reading, mathematics, and science compared 
with the performance of their peers in other countries 
that participated in PIRLS, PISA, and TIMSS. It 
identifies which of these countries have outperformed the 
United States, in terms of students’ average scores and the
percentage of students reaching internationally
benchmarked performance levels, and which countries
have done so consistently.

1 The term “country” is used throughout this special analysis as a common
name for the range of political entities that have participated in each study.
In most cases, participating countries represent an entire nation state, as
in the case of the United States. However, in some studies, participating
countries represent parts of nation states. For example, several Canadian
provinces participated separately in PIRLS 2006, instead of Canada. Likewise,
England and Scotland regularly participate separately (instead of the
entire United Kingdom) and Belgium regularly participates as two units
(Flemish-speaking and French-speaking Belgium) in PIRLS and TIMSS.
Similarly, Hong Kong and Macao, which are special administrative regions
(SAR) of China, also participate independently.

Major findings include: 

Reading 

■ In PIRLS 2006, the average U.S. 4th-graders’ 
reading literacy score (540) was above the PIRLS 
scale average of 500, but below that of 4th-graders 
in 10 of the 45 participating countries, including 
3 Canadian provinces (Russian Federation, Hong 
Kong, Alberta, British Columbia, Singapore, 
Luxembourg, Ontario, Hungary, Italy, and Sweden). 

■ Among the 28 countries that participated in both 
the 2001 and 2006 PIRLS assessments, the average 
reading literacy score increased in 8 countries
and decreased in 6 countries. In the rest of these
countries, including the United States, there was no 
measurable change in the average reading literacy 
score between 2001 and 2006. The number of these 
countries that outperformed the United States 
increased from 3 in 2001 to 7 in 2006. 

Mathematics 

■ The 2007 TIMSS results showed that U.S. students’ 
average mathematics score was 529 for 4th-graders 
and 508 for 8th-graders. Both scores were above the 
TIMSS scale average, which is set at 500 for every 
administration of TIMSS at both grades, and both 
were higher than the respective U.S. score in 1995. 

■ Fourth-graders in 8 of the 35 other countries 
that participated in 2007 (Hong Kong, Singapore,
Chinese Taipei, Japan, Kazakhstan,
Russian Federation, England, and Latvia) 
scored above their U.S. peers, on average; and 
8th-graders in 5 of the 47 other countries that 
participated in 2007 (Chinese Taipei, Korea, 
Singapore, Hong Kong, and Japan) scored 
above their U.S. peers, on average. 

■ Among the 16 countries that participated 
in both the first TIMSS in 1995 and the 
most recent TIMSS in 2007, at grade 4, the 
average mathematics score increased in 8 
countries, including in the United States, 
and decreased in 4 countries. Among the 20 
countries that participated in both the 1995 
and 2007 TIMSS at grade 8, the average 
mathematics score increased in 6 countries, 
including in the United States, and decreased 
in 10 countries. 






■ In PISA 2006, U.S. 15-year-old students’ average 
mathematics literacy score of 474 was lower than 
the OECD average of 498, and placed U.S. 15-year- 
olds in the bottom quarter of participating OECD 
nations, a relative position unchanged from 2003. 

■ Fifteen-year-old students in 23 of the 29 
other participating OECD-member countries 
outperformed their U.S. peers. 

■ There was no measurable change in U.S. 
15-year-olds’ average mathematics literacy 
score between 2003 and 2006, in its 
relationship to the OECD average, or in its 
relative position to the countries whose scores 
increased or decreased. 

Science 

■ The 2007 TIMSS results showed that U.S. students’ 
average science score was 539 for 4th-graders and 
520 for 8th-graders. Both scores were above the 
TIMSS scale average, which is set at 500 for every 
administration of TIMSS at both grades, but 
neither was measurably different than the respective 
U.S. score in 1995. 

■ Fourth-graders in 4 of the 35 other countries 
that participated in 2007 (Singapore, Chinese 
Taipei, Hong Kong, and Japan) scored above 
their U.S. peers, on average; and 8th-graders
in 9 of the 47 other countries that participated
in 2007 (Singapore, Chinese Taipei,
Japan, Korea, England, Hungary, the Czech
Republic, Slovenia, and the Russian Federation)
scored above their U.S. peers, on average.

■ While there was no measurable change in 
the average score of U.S. 4th-graders or 
8th-graders in science between 1995 and 
2007, among the other 15 countries that 
participated in the 1995 and 2007 TIMSS at 
grade 4, the average science score increased in 
7 countries and decreased in 5 countries; and 
among the other 18 countries that participated
in both the 1995 and 2007 TIMSS at
grade 8, the average science score increased in 
5 countries and decreased in 3 countries. 

■ In PISA 2006, U.S. 15-year-old students’ average 
science literacy score of 489 was lower than the 
OECD average of 500, and placed U.S. 15-year-olds 
in the bottom third of participating OECD nations. 
Fifteen-year-old students in 16 of the 29 other
participating OECD-member countries outperformed
their U.S. peers in terms of average scores.

Technical notes about the data sources, methodology, 
and standard errors are included at the end of this report. 
Special analyses are available on The Condition of
Education website (http://nces.ed.gov/programs/coe).






Contents 



Executive Summary
List of Tables
List of Figures
Introduction
Are International Assessment Results Reliable, Valid, and Comparable?
How Do U.S. Students Compare With Their Peers in Other Countries?
Reading
Reading results for 4th-graders
Reading results for 15-year-olds
Synthesis of reading results
Mathematics
Mathematics results for 4th- and 8th-graders
Mathematics results for 15-year-olds
Synthesis of mathematics results
How Much Variation Is There Between Low and High Performers in Different Countries?
Science
Science results for 4th- and 8th-graders
How Much Does Performance Within the United States Vary by School Poverty?
Science results for 15-year-olds
Synthesis of science results
Summary
References
Supplemental Tables
Appendix A: Technical Notes



List of Tables 

Table

1. Participation in the most recent international assessments, by jurisdiction and grade or age
2. Average PIRLS scores of fourth-grade students on reading literacy scale and cutpoint scores for bottom and
   top 10 percent of students in each jurisdiction, by jurisdiction: 2006
3. Average PISA scores of 15-year-old students on reading literacy scale and cutpoint scores for bottom and
   top 10 percent of students in each jurisdiction, by jurisdiction: 2003
4. Average TIMSS scores of fourth-grade students in mathematics and cutpoint scores for bottom and
   top 10 percent of students in each jurisdiction, by jurisdiction: 2007
5. Average TIMSS scores of eighth-grade students in mathematics and cutpoint scores for bottom and
   top 10 percent of students in each jurisdiction, by jurisdiction: 2007
6. Average PISA scores of 15-year-old students on mathematics literacy scale and cutpoint scores for bottom
   and top 10 percent of students in each jurisdiction, by jurisdiction: 2006
7. Average TIMSS scores of fourth-grade students in science and cutpoint scores for bottom and top 10
   percent of students in each jurisdiction, by jurisdiction: 2007
8. Average TIMSS scores of eighth-grade students in science and cutpoint scores for bottom and top 10
   percent of students in each jurisdiction, by jurisdiction: 2007
9. Average PISA scores of 15-year-old students on science literacy scale and cutpoint scores for bottom and
   top 10 percent of students in each jurisdiction, by jurisdiction: 2006
A-1. Average PIRLS scores of fourth-grade students on combined reading literacy scale, by jurisdiction: 2001 and
   2006
A-2. Average PISA scores of 15-year-old students on combined reading literacy scale, by jurisdiction: 2000,
   2003, and 2006
A-3. Average TIMSS mathematics scores of fourth-grade students on combined mathematics scale, by
   jurisdiction: 1995, 2003, and 2007
A-4. Average TIMSS mathematics scores of eighth-grade students on combined mathematics scale, by
   jurisdiction: 1995, 1999, 2003, and 2007
A-5. Average PISA scores of 15-year-old students on combined mathematics literacy scale, by jurisdiction: 2000,
   2003, and 2006
A-6. Average TIMSS science scores of fourth-grade students on combined science scale, by jurisdiction: 1995,
   2003, and 2007
A-7. Average TIMSS science scores of eighth-grade students on combined science scale, by jurisdiction: 1995,
   1999, 2003, and 2007
A-8. Average PISA scores of 15-year-old students on combined science literacy scale, by jurisdiction: 2000,
   2003, and 2006



List of Figures

Figure

1. Percentage of fourth-grade students reaching the PIRLS international benchmarks in reading, by
   jurisdiction: 2006
2. Change in average PIRLS reading literacy scores of fourth-grade students in selected jurisdictions, by
   jurisdiction: 2001 to 2006
3. Percentage distribution of 15-year-old students on PISA reading literacy scale, by proficiency level and
   jurisdiction: 2003
4. Change in average PISA reading literacy scores of 15-year-old students in selected jurisdictions, by
   jurisdiction: 2000 to 2003
5. Percentage of fourth-grade students reaching the TIMSS international benchmarks in mathematics, by
   jurisdiction: 2007
6. Percentage of eighth-grade students reaching the TIMSS international benchmarks in mathematics, by
   jurisdiction: 2007
7. Change in average TIMSS mathematics scores of fourth-grade students in selected jurisdictions, by
   jurisdiction: 1995 to 2007
8. Change in average TIMSS mathematics scores of eighth-grade students in selected jurisdictions, by
   jurisdiction: 1995 to 2007
9. Percentage distribution of 15-year-old students on PISA mathematics literacy scale, by proficiency level and
   jurisdiction: 2006
10. Change in average PISA mathematics scores of 15-year-old students in selected jurisdictions, by
   jurisdiction: 2003 to 2006
A-1. Distribution of PISA scores for 15-year-old students on the mathematics literacy scale at the 10th and
   90th percentiles, by OECD jurisdiction: 2006
A-2. Distribution of PISA scores for 15-year-old students on the mathematics literacy scale at the 10th and
   90th percentiles, by non-OECD jurisdiction: 2006
11. Percentage of fourth-grade students reaching the TIMSS international benchmarks in science, by
   jurisdiction: 2007
12. Percentage of eighth-grade students reaching the TIMSS international benchmarks in science, by
   jurisdiction: 2007
13. Change in average TIMSS science scores of fourth-grade students in selected jurisdictions, by jurisdiction:
   1995 to 2007
14. Change in average TIMSS science scores of eighth-grade students in selected jurisdictions, by jurisdiction:
   1995 to 2007
15. Percentage distribution of 15-year-old students on PISA science literacy scale, by proficiency level and
   jurisdiction: 2006



Introduction 






The National Center for Education Statistics (NCES) 
is congressionally mandated to report on the state of 
education in the United States and other countries. 1 
To carry out this mission, NCES participates in 
several international assessments to measure how the 
performance of U.S. students and adults compares 
with that of their counterparts in other countries. 

This special analysis looks closely at the information 
NCES has gathered from recent international studies 
that U.S. students have participated in: the Progress 
in International Reading Literacy Study (PIRLS), the 
Program for International Student Assessment (PISA), 
and the Trends in International Mathematics and Science 
Study (TIMSS). 2 

This special analysis describes the most recent results 
from these international studies as well as trends in the 
results, when possible. It is organized by subject area into 
three parts — reading, mathematics, and science. For each 
subject area, the following topics are addressed: 

■ How does the performance of U.S. students 
compare with their peers in other countries? 

■ Which countries’ students outperform U.S. 
students, and which have done so consistently? 

■ How has the performance of U.S. students changed 
over time? 

■ To what extent has the performance of U.S. 
students changed over time relative to their peers in 
high-performing countries? 

The three international studies examined in this special 
analysis periodically measure one or more dimensions 
of the performance of students at different ages or grade 
levels. PIRLS, sponsored by the International Association 
for the Evaluation of Educational Achievement (IEA) and 
first conducted in 2001, assesses the reading performance 
of 4th-graders every 5 years. PISA, sponsored by 
the Organization for Economic Cooperation and 
Development (OECD) and first conducted in 2000, 
assesses the reading, mathematics, and science literacy of 
15-year-old students every 3 years. 3



1 Most recently mandated in the Education Sciences Reform Act of 2002.

2 This special analysis does not examine the results of international assessments
of adult literacy, in which the United States has also participated.
The reason for this is that the results of the 2003 Adult Literacy and
Lifeskills Survey (ALL), the last assessment of adult literacy, have already
been described in The Condition of Education 2006 special analysis (see
http://nces.ed.gov/programs/coe/2006/analysis/index.asp), and the next
assessment, the Program for the International Assessment of Adult Competencies
(PIAAC), is not scheduled to be conducted until 2011.

3 While PISA assesses each subject area every 3 years, each assessment
cycle focuses on one particular subject. In 2000, the focus was on reading
literacy; in 2003, on mathematics literacy; in 2006, on science literacy. In
2009, the focus is on reading literacy again.



And TIMSS, sponsored by the IEA and first conducted in 1995,
assesses the mathematics and science performance of both 
4th- and 8th-graders every 4 years. 4 Although organized 
and run by two different international organizations, 
these three assessments all provide score results on a 
scale of 0 to 1,000, with a standard deviation of 100. 5 
However, scores from different assessment studies (e.g., 
PISA and TIMSS) cannot be compared with each other 
directly because of differences in each study’s purpose, 
subject matter, and assessed grade or age. Thus all 
comparisons in this special analysis are between countries 
that participated in the same study. 
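
Because each reporting scale has a standard deviation of 100, a score gap within a single study can be read as a fraction of a standard deviation. The following is a minimal Python sketch of that arithmetic, using the PISA 2006 and TIMSS 2007 mathematics figures cited in the Executive Summary. The function name `gap_in_sd_units` is illustrative only and is not part of any study's methodology.

```python
SCALE_SD = 100  # standard deviation of each study's reporting scale, as stated in the text


def gap_in_sd_units(score, reference, scale_sd=SCALE_SD):
    """Return (score - reference) expressed in scale standard deviations."""
    return (score - reference) / scale_sd


# PISA 2006 mathematics literacy: U.S. average 474 vs. OECD average 498.
pisa_math_gap = gap_in_sd_units(474, 498)

# TIMSS 2007 grade 4 mathematics: U.S. average 529 vs. scale average 500.
timss_g4_math_gap = gap_in_sd_units(529, 500)

print(f"PISA 2006 math:  {pisa_math_gap:+.2f} SD relative to the OECD average")
print(f"TIMSS 2007 math: {timss_g4_math_gap:+.2f} SD relative to the scale average")
```

The two results describe gaps within their own studies only. As noted above, they should not be compared with each other, because the studies differ in purpose, subject matter, and assessed grade or age.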

It is important to point out here that the term “country” 
is used for simplicity’s sake throughout this special 
analysis as a common name for the range of political 
entities that have participated in each study. In most 
cases, participating countries represent an entire nation 
state, as in the case of the United States. However, in 
some studies, participating countries represent parts of 
nation states. For example, several Canadian provinces 
participated separately in PIRLS 2006, instead of 
Canada. Likewise, England and Scotland regularly 
participate separately (instead of the entire United 
Kingdom) and Belgium regularly participates as two 
units (Flemish-speaking and French-speaking Belgium) 
in PIRLS and TIMSS. Similarly, Hong Kong and Macao, 
which are special administrative regions (SAR) of China, 
also participate independently. 6 

Not all countries have participated in all three studies 
or in all administrations of a single study’s assessments. 7 
Table 1 lists the participating countries in the most
recent administration of each assessment, and
supplemental tables A-1 through A-8 list participating countries in
all administrations of the assessments. All three studies
include both developed and developing countries; 
however, TIMSS and PIRLS have a larger proportion 
of developing countries participating than PISA does 
because PISA is principally a study of the member 
countries of the OECD — an intergovernmental 
organization of 30 developed countries. 



4 In 1995, TIMSS also assessed students at the end of secondary school: in 
some countries, these were students in grade 10, while in others these were 
students in grade 14. In the United States, 12th-graders were assessed. 

5 For details about scale scores, see appendix A. 

6 In some assessments, subnational units such as states and regions have
been benchmarking participants either instead of or in addition to the
entire nation-state. For a list of U.S. states that have participated in
international assessments, independent of the nation as a whole, see appendix A.

Note that official designation of participating entities may differ between
assessments. For example, in TIMSS, the official designation for Hong
Kong is “Hong Kong SAR,” while in PISA, it is “Hong Kong-China.” In
the text of this special analysis, shortened forms of official designations are
used; but in the figures and tables, the assessment’s full official designations
are used.

7 Countries vary over time in the assessments in which they participate 
for a variety of reasons, including individual countries’ perceptions of the 
benefits and costs of each assessment, and the specific logistic challenges of 
administering the assessments. 






Differences in the set of countries that participate in an 
assessment can affect how well the United States appears 
to do internationally when results are released. One 
reason for this is that average student performance in 
developed countries tends to be higher than in developing 
countries. As a result, the extent to which developing 
countries participate in an assessment can affect the 
international average of participating countries as well as 
the relative position of one country compared with the 
others. 8 To deal with this problem, none of the 
international assessments calculates an international 
“average” score based on results of all participating 
countries. Instead, PISA calculates an OECD average, for 
each PISA subject area, that is based only on the results 
of the OECD-member countries. All OECD-member 
countries participate in PISA; therefore, PISA ostensibly
calculates this average based on a consistent group of
countries. 9



8 Specifically, the more developing countries that participate in a study, the
lower the international average tends to be and the higher the participating
developed countries appear to be ranked.



TIMSS and PIRLS, on the other hand, do
not calculate an average based on the results of any of the 
participating countries; they report results relative to the 
midpoint of each assessment’s reporting scale, which they 
call the “scale average.” 10 
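
The effect of the participating set on an all-participant average can be illustrated with a small Python sketch. All country names and scores below are hypothetical and invented purely for illustration. The helper `average_and_rank` is likewise an assumption of this sketch, not a procedure used by PISA, TIMSS, or PIRLS.

```python
from statistics import mean

# Hypothetical average scores, invented purely for illustration.
developed = {"Country A": 540, "Country B": 525, "Country C": 510}
developing = {"Country D": 430, "Country E": 410}


def average_and_rank(scores, country):
    """Return (simple mean of all scores, 1-based rank of `country`, number of countries)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return mean(scores.values()), ordered.index(country) + 1, len(scores)


# Country C compared only with other developed countries.
avg_small, rank_small, n_small = average_and_rank(developed, "Country C")

# The same score for Country C once developing countries also participate.
avg_large, rank_large, n_large = average_and_rank({**developed, **developing}, "Country C")

print(f"Developed only: average {avg_small}, Country C ranks {rank_small} of {n_small}")
print(f"All countries:  average {avg_large}, Country C ranks {rank_large} of {n_large}")
```

In this sketch, Country C's score of 510 is below the average and last when only developed countries participate, but above the average and in the middle of the ranking once the two lower-scoring countries are added. Its own performance has not changed. This is the kind of shift that reporting against an OECD average or a fixed scale average is meant to avoid.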

All differences reported in this special analysis are 
statistically “measurable” or significant at a 95 percent 
level of confidence. All t-tests supporting this special
analysis were done without adjustments for multiple 
comparisons. It is also important to note that the 
purpose of this special analysis is to provide descriptive 
information; thus, complex interactions and causal 
relationships have not been explored. Readers are 
cautioned not to make causal inferences based on the 
results presented here. 
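
A test of the difference between two estimated averages can be sketched as follows. This is a minimal Python illustration of one common form of such a test. It assumes independent samples and a large-sample critical value of 1.96 for a two-sided test at the 95 percent level. The averages and standard errors are hypothetical, and the function names are illustrative; the procedures actually used are described in appendix A.

```python
import math

CRITICAL_VALUE = 1.96  # two-sided, 95 percent confidence, large-sample approximation (assumed)


def t_statistic(est_a, se_a, est_b, se_b):
    """t statistic for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def is_measurably_different(est_a, se_a, est_b, se_b, critical=CRITICAL_VALUE):
    """True when the difference is significant at the chosen level,
    with no adjustment for multiple comparisons."""
    return abs(t_statistic(est_a, se_a, est_b, se_b)) > critical


# Hypothetical averages and standard errors, invented for illustration.
small_gap = is_measurably_different(540, 3.5, 547, 3.0)  # 7-point gap
large_gap = is_measurably_different(540, 3.5, 558, 3.0)  # 18-point gap

print("7-point gap measurable: ", small_gap)
print("18-point gap measurable:", large_gap)
```

With these hypothetical standard errors, the 7-point gap is not measurably different while the 18-point gap is. A difference in average scores therefore cannot be judged measurable from its size alone.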



9 While all OECD-member countries’ results are used to calculate PISA’s 
OECD average, the number of countries used to calculate this average 
has actually increased. For example, in 2000, results for The Netherlands 
were not used to calculate the average because of its low response rates. In 
addition, between 2000 and 2003, the total number of countries in the 
OECD increased from 28 to 30 when the Slovak Republic and Turkey 
joined the OECD. 

10 Although the IEA uses the label “scale average,” this is not actually a
calculated average: it equals 500 because that is the “average” value on the
assessment’s 1,000-point scale. For a more detailed explanation of scale
scores and scale averages, see appendix A.






Table 1. Participation in the most recent international assessments, by jurisdiction and grade or age

                                PIRLS 2006   PISA 2006      TIMSS 2007 1
Jurisdiction                    4th grade    15-year-olds   4th grade   8th grade

OECD 3
Australia                                    •              •           •
Austria                         •            •              •
Belgium                                      •
  Flemish                       •
  French                        •
Canada                                       •
  Alberta                       •                           o
  British Columbia              •                           o           o
  Nova Scotia                   •
  Ontario                       •                           o           o
  Quebec                        •                           o           o
Czech Republic                               •              •           •
Denmark                         •            •              •
Finland                                      •
France                          •            •
Germany                         •            •              •
Greece                                       •
Hungary                         •            •              •           •
Iceland                         •            •
Ireland                                      •
Italy                           •            •              •           •
Japan                                        •              •           •
Korea, Republic of                           •                          •
Luxembourg                      •            •
Mexico                                       •
Netherlands                     •            •              •
New Zealand                     •            •              •
Norway                          •            •              •           •
Poland                          •            •
Portugal                                     •
Slovak Republic                 •            •              •
Spain                           •            •
Sweden                          •            •              •           •
Switzerland                                  •
Turkey                                       •                          •
United Kingdom                               •
  England                       •                           •           •
  Scotland                      •                           •           •
United States 4                 •            •              •           •
OECD country total              19           30             16          12
Total OECD jurisdictions        25           30             16          12

                                PIRLS 2006   PISA 2006      TIMSS 2007 2
Jurisdiction                    4th grade    15-year-olds   4th grade   8th grade

Non-OECD
Algeria                                                     •           •
Argentina                                    •
Armenia                                                     •           •
Azerbaijan                                   •
Bahrain                                                                 •
Bosnia and Herzegovina                                                  •
Botswana                                                                •
Brazil                                       •
Bulgaria                        •            •                          •
Chile                                        •
Chinese Taipei                  •            •              •           •
Colombia                                     •              •           •
Croatia                                      •
Cyprus                                                                  •
Egypt                                                                   •
El Salvador                                                 •           •
Estonia                                      •
Georgia                         •                           •           •
Ghana                                                                   •
Hong Kong-China                 •            •              •           •
Indonesia                       •            •                          •
Iran, Islamic Republic of       •                           •           •
Israel                          •            •                          •
Jordan                                       •                          •
Kazakhstan                                                  •
Kyrgyz Republic                              •
Kuwait                          •                           •           •
Latvia                          •            •              •
Lebanon                                                                 •
Liechtenstein                                •
Lithuania                       •            •              •           •
Macedonia                       •
Macao-China                                  •
Malaysia                                                                •
Malta                                                                   •
Moldova, Republic of            •
Montenegro, Republic of                      •
Morocco                         •                           •
Oman                                                                    •
Palestinian National Authority                                          •
Qatar                           •            •              •           •
Romania                         •            •                          •
Russian Federation              •            •              •           •
Saudi Arabia                                                            •
Serbia, Republic of                          •                          •
Singapore                       •                           •           •
Slovenia                        •            •              •           •
South Africa                    •
Syrian Arab Republic                                                    •
Thailand                                     •                          •
Trinidad and Tobago             •
Tunisia                                      •              •           •
Ukraine                                                     •           •
Uruguay                                      •
Yemen                                                       •
Non-OECD country total          20           27             20          36
Total jurisdictions             45           57             36          48



1 Four Canadian provinces (Alberta, British Columbia, Ontario, and Quebec), as well as the Basque region of Spain, Dubai of the United 
Arab Emirates, and Massachusetts and Minnesota of the United States, participated in TIMSS 2007 as benchmarking participants and are 
not included in the total counts shown. 

2 Although Mongolia and Morocco participated at both grades, the quality of the data for Mongolia was not well documented at both 
grades and there was a problem with the participation rates for Morocco at the eighth grade. For more information, see Olson, J.F., Martin, 
M.O., & Mullis, I.V.S. (Eds.). (2008). TIMSS 2007 Technical Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College. 

3 There are a total of 30 countries in the Organization for Economic Cooperation and Development (OECD). An OECD country is counted 
in the OECD total if it participated as a single/entire country (which is generally the case) or if it participated as one or more component 
jurisdictions of the country (e.g., England and Scotland as representing the United Kingdom). 

4 PISA 2006 reading literacy results were not reported for the United States because of an error in printing the test booklets. 

NOTE: A bullet indicates participation in the particular assessment. An open bullet "o" indicates jurisdictions that participated as
"benchmarking participants."

SOURCE: International Association for the Evaluation of Educational Achievement (IEA), Progress in International Reading Literacy Study
(PIRLS), 2006; IEA, Trends in International Mathematics and Science Study (TIMSS), 2007; Organization for Economic Cooperation and
Development (OECD), Program for International Student Assessment (PISA), 2006.






Are International Assessment 
Results Reliable, Valid, and 
Comparable? 

Since the United States began participating in 
comparative international assessments in the 1960s, the 
number and scope of international assessments have 
grown. In addition, the quality of the data they collect has 
improved because of the international adoption of ever 
more rigorous technical standards and monitoring, along 
with growing expertise in the international community 
relating to assessment design (National Research Council 
2002, p. 9). The international organizations that sponsor 
international student assessments — the OECD and 
the International Association for the Evaluation of 
Educational Achievement (IEA) — go to great lengths to 
ensure that their assessment results are reliable, valid, and 
comparable among participating countries. 11 

For each study, the sponsoring international organization 
verifies that all participating countries select a nationally 
representative sample of schools and, from those schools, 
randomly select either classrooms of a particular grade 
or students of the particular age or grade targeted by the 
assessment. To ensure comparability, target grades or ages 
are clearly defined. For example, in TIMSS, at the upper 
grade level, countries are required to sample students in 
the grade that corresponds to the end of 8 years of formal 
schooling, providing that the mean age of the students 
at the time of testing is at least 13.5 years. Moreover, 
comparisons by age are carefully chosen to ensure 
that students at the target age are enrolled in school at 
comparable rates across countries. For example, PISA 
elected to study 15-year-old students because 15 is the 
oldest age at which enrollment rates remain around 90 
percent or higher in most developed countries, including 
the United States (OECD 2008, table C2.1). For students 
16 and older, attendance is not universally compulsory. 
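
The two-stage selection described above (schools first, then students within the selected schools) can be sketched as follows. This is a minimal illustration only: the sampling frame, the school and student identifiers, and the use of simple random sampling at both stages are assumptions made for the example, not the procedure documented in the technical reports.

```python
import random


def two_stage_sample(schools, n_schools, n_students, seed=0):
    """Pick schools at random, then students at random within each picked school.

    `schools` maps a school name to its list of eligible student IDs.
    Simple random sampling at both stages is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Stage 1: draw schools from the (sorted, hence reproducible) frame.
    chosen_schools = rng.sample(sorted(schools), n_schools)
    sample = {}
    for school in chosen_schools:
        roster = schools[school]
        # Stage 2: draw students; take the whole roster if it is small.
        k = min(n_students, len(roster))
        sample[school] = rng.sample(roster, k)
    return sample


# Hypothetical sampling frame: 20 schools with 30 eligible students each.
frame = {
    f"school_{i:02d}": [f"s{i:02d}_{j:02d}" for j in range(30)]
    for i in range(20)
}
sample = two_stage_sample(frame, n_schools=5, n_students=10)
```

Fixing the seed makes the draw reproducible, which is convenient when a selection has to be checked by someone else.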

Not all selected schools and students choose to participate 
in the assessment; and certain students, such as some 
with mental or physical disabilities, may not be able to 
take the assessment. Thus the sponsoring international 
organizations check each country’s participation rates 
(for schools and students) and exclusion rates (at the 
school level and within schools) to ensure they meet 
established target rates in order for the country’s results 
to be reported. 12 

In addition to international requirements and verification 
to ensure valid samples, the sponsoring international 
organizations require compliance with standardized 



11 For complete details on the methods instituted to ensure data quality
and comparability, see OECD 2008; Martin et al. 2007; and Olson,
Martin, and Mullis 2008.

12 The United States also conducts its own nonresponse bias analysis if 
school participation rates are below 85 percent. For more details about 
nonresponse bias analysis, see appendix A. 



procedures for the preparation, administration, and 
scoring of assessments. Countries are required to send 
quality-control monitors to visit schools and scoring 
centers to report on compliance with the standardized 
procedures. Furthermore, independent international 
quality-control monitors visit a sample of schools in each 
country to ensure that the international standards are met. 

Results for countries that fail to meet the required 
participation rates or other international requirements 
are footnoted with explanations of the specific failures 
(e.g., “only met guidelines for sample participation 
rates after substitute schools were included”), shown 
separately in the international reports (e.g., listed in a 
separate section at the bottom of a table), or omitted 
from the international reports and datasets (as happened
to the Netherlands’ PISA results in 2000, the United
Kingdom’s PISA results in 2003, and Morocco’s TIMSS 
2007 results at grade 8). For more details on international 
requirements, see appendix A. 

Every participating country is involved in a thorough 
process of developing the assessment. The national 
representatives from each country review every test item 
to be included in the assessment to ensure that each item 
adheres to the internationally agreed-upon framework (the 
outline of the topics and skills that should be assessed in 
a particular subject area), and that each item is culturally 
appropriate for their country. Each country translates the
assessment into its own language or languages, and
external translation companies independently review each 
country’s translations. 

A “field test” (a small-scale, trial run of the assessment) is 
then conducted in the participating countries to see if any 
items were biased because of national, social, or cultural 
differences. Statistical analyses of the item data are also 
conducted to check for evidence of differences in student 
performance across countries that could indicate a 
linguistic or conceptual translation problem. Problematic 
items may be dropped from the final pool of items or 
scaled differently. 

When this process is complete, the main assessment 
instruments are created. Each assessment “instrument” 
consists of the instructions, the same number of “blocks” 
of items (each block is a small set of selected items from 
the final pool of items), and a student background 
questionnaire. (Additional questionnaires are often 
prepared and administered to the students’ teachers, 
parents, and/or school principal.) The instruments are 
then administered to the sampled students in each of the 
participating countries at comparable times. 

For more details on the development and administration 
of the international assessments, see the Technical 
Reports produced for each assessment. 






How Do U.S. Students Compare 
With Their Peers in Other Countries? 






This section presents key findings from PIRLS, PISA, and 
TIMSS and is organized, by subject area, into three parts: 
reading, mathematics, and science. For each subject area, 
the assessments in that subject are described and their 
similarities and differences are highlighted. Then for each 
assessment in that subject, 

■ the U.S. average (mean) student score is compared 
with those of the other participating countries; 

■ the threshold or cutpoint score for “the top 10 
percent” of U.S. students (technically the score of 
U.S. students at the 90th percentile) is compared 
with the cutpoint score for the top 10 percent of 
students in other countries; 

■ the cutpoint score for the bottom 10 percent of U.S. 
students is compared with the cutpoint score for the 
bottom 10 percent of students in other countries; 

■ the percentage of students reaching the highest 
international benchmark or highest level of 
proficiency set by each assessment is compared; and 

■ changes in these measures over time for the 
United States and for the top-scoring countries are 
examined, when possible. 

These data are described to provide a broader 
understanding of the performance of U.S. students 
compared to their peers around the world than is gained 
by just knowing average scores. Specifically, knowing 
the cutpoint scores for the top and bottom 10 percent 
of students tells us how well the highest and lowest 
performing students do in each country and how 
wide a range there is in student performance within 
each country. This range, in turn, provides important 
contextual information to understand whether a country 
that outperforms the United States scores higher on 
account of the performance of its students overall, of 
mostly its top-performing students, or of mostly its 
low-performing students. In contrast, comparing the 
percentage of students who reach the same international 
benchmarks or levels of proficiency provides information 
on the extent to which a country’s education system 
brings student performance up to standardized levels that 
have been internationally established. 
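
The four measures described above (average score, cutpoint scores for the bottom and top 10 percent, and the percentage of students reaching a benchmark) can be illustrated with a short sketch. The scores below are synthetic and unweighted, and the interpolation rule for percentiles is one common choice among several; the published figures rely on sampling weights and scaling procedures that this example ignores.

```python
import random
import statistics


def percentile(scores, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("scores must not be empty")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(scores, benchmark):
    """Average, 10th and 90th percentile cutpoints, and percent at benchmark."""
    at_benchmark = sum(score >= benchmark for score in scores)
    return {
        "mean": statistics.fmean(scores),
        "bottom_10_cutpoint": percentile(scores, 10),
        "top_10_cutpoint": percentile(scores, 90),
        "percent_at_benchmark": 100 * at_benchmark / len(scores),
    }


# Synthetic scores for one hypothetical jurisdiction (not real assessment data).
rng = random.Random(2009)
scores = [rng.gauss(500, 100) for _ in range(5000)]
summary = summarize(scores, benchmark=625)
print(summary)
```

The distance between the two cutpoints gives the within-country range discussed above: two jurisdictions with the same mean can differ considerably in how far apart their 10th and 90th percentile scores lie.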

After these data have been described for each assessment, 
you will find references for more detailed information 
and a brief synthesis of all the assessment results in the 
subject area. 



Reading 

Both PIRLS and PISA assess aspects of reading skills, 
but they differ in terms of whom they assess and what 
they assess. 

PIRLS assesses 4th-graders and is designed to reflect 
the curriculum of participating countries. PIRLS asks 
students to read two texts — either two literary texts 
(narrative fiction, generally drawn from children’s 
books), two informational texts (typically excerpts from 
biographies, step-by-step instructions, or scientific or 
non-fiction materials), or one of each type. It then asks 
students about a dozen questions (both multiple-choice 
and open-ended “constructed response”) about the 
texts that range from identifying the place, time, and 
actions of the main characters or events to interpreting 
how characters might feel, why events occurred, or what 
the passage means overall (e.g., does the story teach a 
lesson?). 13 

PISA assesses 15-year-old students and does not explicitly 
focus on curricular outcomes; rather it focuses on 
cognitive skills and the application of reading to problems 
within a real-life context. Thus it presents students 
with a range of texts that they are likely to encounter as 
young adults, such as excerpts from government forms, 
brochures, newspaper articles, instruction manuals, 
books, and magazines. For each text, it then usually asks 
each student 3-5 questions (both multiple choice and 
constructed response) to measure the extent to which 
students can retrieve information, interpret a text, reflect 
on a text, and evaluate its author’s rhetorical choices. 14 
In years when PISA focuses on reading, students receive 
between 12 and 24 reading texts (depending on the 
particular cluster of items in their particular test booklet); 
when PISA focuses on mathematics or science, students 
receive about 7 reading texts. 

Reading results for 4th-graders 

In PIRLS 2006, U.S. 4th-graders’ average reading
literacy score (540) was above the PIRLS scale average
of 500, but below that of 4th-graders in 10 of the 45 
participating countries, including 3 Canadian provinces 
(Russian Federation, Hong Kong, Alberta, British 
Columbia, Singapore, Luxembourg, Ontario, Hungary, 
Italy, and Sweden) 15 (table 2). The top 10 percent of U.S. 
4th-graders scored 631 or higher, a cutpoint score below 
that of the top 10 percent of students in 8 countries. 

The bottom 10 percent of U.S. 4th-graders scored 441 
or lower, a cutpoint score below that of the bottom 10 
percent of students in 13 countries. 



13 Examples of PIRLS items can be viewed at
http://nces.ed.gov/pubs2008/2008017_2.pdf.

14 Examples of PISA reading items can be viewed at
http://www.oecd.org/dataoecd/30/17/39703267.pdf, pages 288 to 291.

15 Countries are listed in rank order from highest to lowest score for
countries outperforming the United States.






Table 2. Average PIRLS scores of fourth-grade students on reading literacy scale and cutpoint scores for bottom and
top 10 percent of students in each jurisdiction, by jurisdiction: 2006

                                              Cutpoint score
Jurisdiction                 Average score    Bottom 10 percent    Top 10 percent
All jurisdictions            500 ↓            398 ↓                597 ↓
Russian Federation           565 ↑            474 ↑                649 ↑
Hong Kong, SAR 1             564 ↑            486 ↑                637
Canada, Alberta              560 ↑            472 ↑                645 ↑
Canada, British Columbia     558 ↑            467 ↑                645 ↑
Singapore                    558 ↑            456                  652 ↑
Luxembourg                   557 ↑            470 ↑                641 ↑
Canada, Ontario              555 ↑            463 ↑                644 ↑
Hungary                      551 ↑            459                  637
Italy                        551 ↑            462 ↑                637
Sweden                       549 ↑            465 ↑                627
Germany                      548              463 ↑                627
Belgium (Flemish) 2          547              474 ↑                616 ↓
Bulgaria                     547              437                  647 ↑
Netherlands 2                547              478 ↑                613 ↓
Denmark                      546              454                  629
Canada, Nova Scotia          542              442                  634
Latvia                       541              460 ↑                619 ↓
United States 2              540              441                  631
England                      539              423 ↓                645 ↑
Austria                      538              454                  617 ↓
Lithuania                    537              461 ↑                608 ↓
Chinese Taipei               535              451                  613 ↓
Canada, Quebec               533              450                  611 ↓
New Zealand                  532 ↓            415 ↓                637
Slovak Republic              531 ↓            433                  617 ↓
Scotland 2                   527 ↓            420 ↓                624
France                       522 ↓            433                  605 ↓
Slovenia                     522 ↓            427 ↓                608 ↓
Poland                       519 ↓            417 ↓                612 ↓
Spain                        513 ↓            420 ↓                599 ↓
Israel                       512 ↓            369 ↓                626
Iceland                      511 ↓            417 ↓                594 ↓
Belgium (French)             500 ↓            409 ↓                585 ↓
Moldova, Republic of         500 ↓            406 ↓                584 ↓
Norway 3                     498 ↓            409 ↓                579 ↓
Romania                      489 ↓            362 ↓                597 ↓
Georgia                      471 ↓            369 ↓                565 ↓
Macedonia, Republic of       442 ↓            305 ↓                571 ↓
Trinidad and Tobago          436 ↓            295 ↓                563 ↓
Iran, Islamic Republic of    421 ↓            295 ↓                539 ↓
Indonesia                    405 ↓            301 ↓                504 ↓
Qatar                        353 ↓            228 ↓                479 ↓
Kuwait                       330 ↓            186 ↓                476 ↓
Morocco                      323 ↓            181 ↓                468 ↓
South Africa                 302 ↓            141 ↓                500 ↓

↑ Score is higher than U.S. score.

↓ Score is lower than U.S. score.

1 Hong Kong is a Special Administrative Region (SAR) of the People's Republic of China. 

2 Met Progress in International Reading Literacy Study (PIRLS) guidelines for sample participation rates only after substitute schools were 
included. 

3 Did not meet guidelines for sample participation rates after substitute schools were included. 

NOTE: Jurisdictions are ordered on the basis of average scores, from highest to lowest. Reading literacy scores are reported on a scale
from 0 to 1,000. A cutpoint score is the threshold score for an established level of performance. The cutpoint score for students in the top
10 percent is the 90th percentile score within the jurisdiction. The cutpoint score for students in the bottom 10 percent is the 10th percentile
score within the jurisdiction. The tests for significance take into account the standard error for the reported difference. Thus, a small
difference between the United States and one country may be significant while a large difference between the United States and another
country may not be significant.

SOURCE: Mullis, I.V.S., Martin, M.O., Kennedy, A.M., and Foy, P. (2007). PIRLS 2006 International Report: IEA's Progress in International Reading
Literacy Study in Primary Schools in 40 Countries, exhibit 1.1. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College; and
International Association for the Evaluation of Educational Achievement (IEA). Progress in International Reading Literacy Study (PIRLS), 2006,
unpublished tabulations (November 2008).
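
The point made in the table note, that a small difference can be significant while a larger one is not, can be illustrated with a brief sketch. The scores and standard errors below are hypothetical, the two estimates are assumed to be independent, and the critical value of 1.96 (a two-sided test at the 5 percent level) is an assumption of the example rather than a value stated in this report.

```python
import math


def is_measurably_different(score_a, se_a, score_b, se_b, critical=1.96):
    """Two-sided test of the difference between two independent estimates."""
    # Standard error of the difference, assuming independent samples.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(score_a - score_b) / se_diff > critical


# Hypothetical reference jurisdiction: average 540, standard error 2.0.
reference = (540, 2.0)

# A 6-point difference estimated precisely (standard error 1.5).
small_gap = is_measurably_different(546, 1.5, *reference)

# An 8-point difference estimated imprecisely (standard error 5.0).
large_gap = is_measurably_different(548, 5.0, *reference)

print(small_gap, large_gap)
```

In this example the 6-point gap passes the test because both estimates are precise, while the 8-point gap does not because one estimate carries a large standard error.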






PIRLS has developed four international benchmarks 
to help analyze the range of students’ performance in 
reading within each participating country, with the 
highest, or Advanced, benchmark set at 625 score points. 16 
For PIRLS 2006, students reaching the Advanced 
benchmark could interpret figurative language; integrate 
ideas across a text to provide interpretations of a 
character’s traits, intentions, and feelings; and provide full 
text-based support for their interpretations. 17 

In 2006, 12 percent of U.S. 4th-graders reached
this benchmark (figure 1). Eight participating countries, 
including 3 Canadian provinces, had a higher percentage 
of 4th-graders reaching this benchmark, ranging 
from 19 to 15 percent: Singapore, Russian Federation, 
Alberta, Bulgaria, British Columbia, Ontario, England, 
and Luxembourg. Among the countries with a greater 
percentage of students than the United States reaching 
the Advanced benchmark, two did not have average 
student scores higher than the United States: Bulgaria 
and England. 18 
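
The Advanced benchmark can be read alongside the top 10 percent cutpoint scores in table 2: when a jurisdiction's 90th percentile score is at or above 625, roughly 10 percent or more of its students reach the benchmark. The following sketch applies that check to four cutpoint scores copied from table 2; the function name is introduced only for illustration.

```python
ADVANCED_BENCHMARK = 625  # PIRLS Advanced benchmark, as stated in the text

# Top 10 percent cutpoint scores (90th percentiles) copied from table 2.
top_10_cutpoints = {
    "Singapore": 652,
    "United States": 631,
    "Sweden": 627,
    "Netherlands": 613,
}


def at_least_10_percent_reach(cutpoint_90th, benchmark=ADVANCED_BENCHMARK):
    """True when the 90th percentile score is at or above the benchmark."""
    return cutpoint_90th >= benchmark


for name, cutpoint in top_10_cutpoints.items():
    print(name, at_least_10_percent_reach(cutpoint))
```

For the United States, the cutpoint score of 631 is above 625, which is consistent with the 12 percent of U.S. 4th-graders reported as reaching the Advanced benchmark.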

Change over time 

Among the 28 countries that participated in both the 
2001 and 2006 PIRLS assessments, the average reading 
literacy score increased in 8 countries and decreased 
in 6 countries (figure 2). In the rest of these countries, 
including the United States, there was no measurable 
change in the average reading literacy score between 
2001 and 2006. The number of these countries that 
outperformed the United States increased from 3 in 2001 
to 7 in 2006. 19 Three of the countries that outperformed 
the United States in 2006 (Hong Kong, the Russian 
Federation, and Singapore) had scored below the 
United States, on average, in 2001. In contrast, in 2 of 
the 6 countries where 4th-graders showed measurable 
declines (England and the Netherlands), 4th-graders
outperformed their U.S. peers in 2001, but were not 
measurably different than their U.S. peers in 2006. 

PIRLS will be offered again in 2011. Results from the 
PIRLS 2006 assessment can be found in Baer et al. (2007; 



16 See figure 1 for the cut scores established for the other three
international benchmarks. For details about all the international benchmarks, see
Mullis et al. (2007), chapter 2.

17 The IEA set international benchmarks for PIRLS based on an analysis of 
score points. The score points for each benchmark remain the same across 
assessments; however, the configuration of items that define what students 
reaching a benchmark can do may vary slightly from one assessment to the 
next. For more details on the IEA’s benchmarks and how they differ from 
PISA’s levels of proficiency, see appendix A. 

18 There was no measurable difference between the average student scores 
in the United States and in Bulgaria and England. 

19 Luxembourg and the Canadian provinces of Alberta and British 
Columbia also outperformed the United States in 2006, but they did not 
participate in 2001. 



available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2008017)
and Mullis et al. (2007; available at
http://timss.bc.edu/pirls2006/intl_rpt.html). For more
information on PIRLS, see http://nces.ed.gov/surveys/pirls/.

Reading results for 15-year-olds 

PISA 2006 reading literacy results are not reported for the 
United States because of an error introduced when the test
booklets were printed. 20 Thus the reading literacy results
described here come from PISA 2000 and PISA 2003.

In PISA 2003, U.S. 15-year-old students’ average reading 
literacy score of 495 was not measurably different than 
the OECD average of 494, and placed U.S. 15-year-olds 
in the middle third of participating OECD nations 
(table 3). Fifteen-year-old students in 9 of the 29 other 
participating OECD-member countries outperformed 
their U.S. peers (as did 15-year-olds in 2 of the 11 
non-OECD countries that participated) in terms of 
average scores. U.S. 15-year-olds in the top 10 percent 
scored 622 or higher, a cutpoint score below that of the 
top 10 percent of students in 7 countries (all OECD 
countries). The bottom 10 percent of U.S. 15-year-olds 
scored 361 or lower, a cutpoint score below that of the 
bottom 10 percent of students in 9 OECD countries and 
3 non-OECD countries. 

PISA has developed five levels of proficiency to help 
analyze the range of students’ performance in reading 
within each participating country. 21 The highest level 
of proficiency identifies students who can complete 
sophisticated reading tasks, such as managing information 
that is difficult to find in unfamiliar texts; showing 
detailed understanding of such texts and inferring which 
information in the text is relevant to the task; and being 
able to evaluate critically and build hypotheses, draw on 
specialized knowledge, and accommodate concepts that 
may be contrary to expectations. For PISA 2003, the 
highest level of proficiency corresponds with a score at or 
above 625 score points. 22 



20 In various parts of the U.S. PISA 2006 reading literacy assessment test
booklet, students were incorrectly instructed to refer to the passage on the
“opposite page” when students actually needed to turn back to the
previous page to see the necessary passage.

21 See figure 3 for the cut scores for all five levels of proficiency in 2003. For 
details about all five levels, see OECD 2004, pp. 272-79. 

22 PISA has defined levels of proficiency based on specific student
proficiencies. These specific student proficiencies remain the same across
assessments; however, the score point threshold for students who demonstrate
these specific student proficiencies may vary slightly from assessment to
assessment. For more details on PISA’s levels of proficiency and how they
differ from the IEA’s benchmarks, see appendix A.






